DWS

This repository contains examples of how to use Dynamic Workload Scheduler (DWS) with Google Kubernetes Engine (GKE). More information about DWS is available here.

Setup and Usage

Prerequisites

  • A Google Cloud account and project.
  • The gcloud CLI, installed and configured to use your Google Cloud project.
  • The kubectl command-line tool, installed.
  • The Terraform CLI, installed.
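Before you start, a small shell check like the sketch below can confirm that the command-line tools are on your PATH. The function name require_tools is hypothetical, and it only tests that each command exists. It does not check that gcloud is authenticated or pointed at the right project.

```shell
# Hypothetical helper: report every required CLI that is missing from PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_tools gcloud kubectl terraform && gcloud config get-value project
```

gcloud config get-value project prints the project gcloud is currently configured to use, which should match the project ID you pass to Terraform.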

Clone the repository and change to the example directory:

git clone https://github.com/ai-on-gke/tutorials-and-examples.git
cd tutorials-and-examples/workflow-orchestration/dws-example

Create Clusters

terraform -chdir=tf init
terraform -chdir=tf plan -var project_id=<YOUR_PROJECT_ID>
terraform -chdir=tf apply -var project_id=<YOUR_PROJECT_ID>
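The kubectl commands in the following steps need credentials for the new cluster. The Terraform configuration in tf is not shown here, so the cluster name and location below are placeholders. terraform -chdir=tf output lists whatever outputs that configuration defines. The following is a minimal sketch with a hypothetical wrapper function, for environments where kubectl is not already configured:

```shell
# Hypothetical wrapper around "gcloud container clusters get-credentials".
# The cluster name and location are placeholders: take them from your
# Terraform configuration or outputs. They are not defined by this example.
get_cluster_credentials() {
  cluster_name="$1"   # the GKE cluster created by terraform apply
  location="$2"       # region or zone of that cluster
  project_id="$3"
  gcloud container clusters get-credentials "$cluster_name" \
    --location "$location" --project "$project_id"
}

# Usage:
# get_cluster_credentials CLUSTER_NAME LOCATION YOUR_PROJECT_ID
# kubectl get nodes
```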

Install Kueue

VERSION=v0.12.0
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
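Kueue's webhooks must be running before the resources in the next step can be created, so it can help to wait for the controller Deployment to become available first. The Deployment name kueue-controller-manager and the namespace kueue-system are the ones used by the upstream Kueue manifests. The helper name and the five-minute default timeout are arbitrary choices for this sketch.

```shell
# Block until the Kueue controller Deployment reports Available,
# or fail after the given timeout (default 5m).
wait_for_kueue() {
  kubectl wait deployment/kueue-controller-manager \
    --namespace kueue-system \
    --for=condition=Available \
    --timeout="${1:-5m}"
}

# Usage:
# wait_for_kueue
```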

Create Kueue resources

kubectl apply -f dws-queues.yaml 

Validate installation

Verify that the Kueue ClusterQueue and the DWS AdmissionCheck are active in your GKE cluster:

kubectl get clusterqueues dws-cluster-queue -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}CQ - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"
kubectl get admissionchecks dws-prov -o jsonpath="{range .status.conditions[?(@.type == \"Active\")]}AC - Active: {@.status} Reason: {@.reason} Message: {@.message}{'\n'}{end}"

If the installation and configuration were successful, you should see the following output:

CQ - Active: True Reason: Ready Message: Can admit new workloads
AC - Active: True Reason: Active Message: The admission check is active
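For a scripted check, for example in CI, kubectl wait can block until both objects report an Active condition of True instead of printing it. This sketch reuses the names dws-cluster-queue and dws-prov from the commands above. The function name and the 60-second default timeout are assumptions.

```shell
# Succeeds only if both the ClusterQueue and the AdmissionCheck become
# Active within the timeout. kubectl wait exits non-zero otherwise.
check_dws_queues() {
  wait_timeout="${1:-60s}"
  kubectl wait clusterqueues/dws-cluster-queue --for=condition=Active --timeout="$wait_timeout" &&
    kubectl wait admissionchecks/dws-prov --for=condition=Active --timeout="$wait_timeout"
}

# Usage:
# check_dws_queues && echo "Kueue and DWS are ready"
```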

Create a job

kubectl create -f job-autopilot.yaml
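kubectl create, rather than kubectl apply, is typically required when a manifest uses metadata.generateName. The generated job name in the provisioning request shown below (dws-job-bq9r9) suggests that is the case here, but this is an inference because job-autopilot.yaml is not shown. Kueue tracks each Job through a Workload object, which you can list alongside the job. The following is a minimal sketch with a hypothetical helper that captures the generated name:

```shell
# Hypothetical helper: create the job and print only its generated name
# (e.g. "dws-job-bq9r9") by stripping the "job.batch/" prefix from -o name.
create_dws_job() {
  kubectl create -f "${1:-job-autopilot.yaml}" -o name | sed 's|^job.batch/||'
}

# Usage:
# JOB_NAME="$(create_dws_job)"
# kubectl get workloads   # Kueue's Workload object for the job
```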

How Kueue and DWS work

After creating the job, you can review the provisioning request:

kubectl get provisioningrequests

You should see output similar to this:

NAME                                 ACCEPTED   PROVISIONED   FAILED   AGE
job-dws-job-bq9r9-9409b-dws-prov-1   True       False                   158m

Kueue creates the provisioning request for the job, and DWS processes it. Until the request is fulfilled, Kueue keeps the job suspended. When DWS receives and accepts the request, ACCEPTED becomes True. As soon as DWS can secure the requested resources, PROVISIONED changes to True. At that point, GKE creates the node and the job is scheduled on it. Once the job finishes, GKE automatically releases the node.
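The same lifecycle can be followed from a script by waiting on the request's conditions. This sketch assumes the Provisioned condition type of the ProvisioningRequest API, a request in the current namespace, and a request name taken from the kubectl get provisioningrequests output. In the example output, that name appears to combine the Workload name, the admission check name (dws-prov), and an attempt number. The helper name and the generous 60-minute default timeout are hypothetical, since DWS may need a while to find capacity.

```shell
# Hypothetical helper: block until the given ProvisioningRequest reports
# Provisioned=True, i.e. DWS has secured the requested capacity.
wait_for_provisioning() {
  request_name="$1"   # e.g. job-dws-job-bq9r9-9409b-dws-prov-1
  kubectl wait "provisioningrequests/${request_name}" \
    --for=condition=Provisioned \
    --timeout="${2:-60m}"
}

# Usage:
# wait_for_provisioning job-dws-job-bq9r9-9409b-dws-prov-1
```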

To follow the job through these stages, check the provisioning request, the nodes, and the job:

kubectl get provisioningrequests
kubectl get nodes
kubectl get jobs
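To confirm the last stage, a script can wait for the job to complete and then list the nodes again. The DWS-provisioned node should disappear once GKE scales it down, which may take some minutes after the job finishes. This is a sketch with a hypothetical helper name and a 30-minute default timeout. It takes the generated job name as its argument.

```shell
# Hypothetical helper: wait for the Job to complete, then show the nodes so
# you can check that the DWS-provisioned node is being removed.
wait_for_job_then_nodes() {
  job_name="$1"   # generated name, e.g. dws-job-bq9r9
  kubectl wait "job/${job_name}" --for=condition=Complete --timeout="${2:-30m}" &&
    kubectl get nodes
}

# Usage:
# wait_for_job_then_nodes dws-job-bq9r9
```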

Continue reading: